home *** CD-ROM | disk | FTP | other *** search
- Path: keats.ugrad.cs.ubc.ca!not-for-mail
- From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku)
- Newsgroups: comp.lang.c
- Subject: Re: Tradition or what?
- Date: 15 Mar 1996 21:07:00 -0800
- Organization: Computer Science, University of B.C., Vancouver, B.C., Canada
- Message-ID: <4idi9kINNqbq@keats.ugrad.cs.ubc.ca>
- References: <4g0elg$mdr@redstone.interpath.net> <4hqkso$gno@news1.mnsinc.com> <1996Mar15.163722.6685@pyra.co.uk> <4id0t7$9nr@news1.mnsinc.com>
- NNTP-Posting-Host: keats.ugrad.cs.ubc.ca
-
- In article <4id0t7$9nr@news1.mnsinc.com>,
- Szu-Wen Huang <huang@mnsinc.com> wrote:
- >: May I respectfully suggest that #defines would be clearer and enumerations
- >: clearer still ?
- >
- >No, it won't be, because the states don't always have logical
- >significance. For example, the parser state machine for a simple
- >regular expression [A-Za-z]*=[0-9]* might look:
- >
- >switch (state) {
- > case 0: if (isalpha(input)) ...
- > else if (input == '=') state = 1;
- > break;
- > case 1: if (...
- >}
- >
- >In this case, I don't believe clarify is an issue, first of all,
- >because without the state diagram, the code is next to useless
- >to a reader (these code tend to be long). A practical parser
- >would have many more states, and #defining those would both be
- >a bother and serve no practical purpose in terms of readability.
- >In this case, what would you name the states?
-
- This is clearly a valid point. But the real source code in this case is the
- diagram. In effect, you have compiled the regular expression manually, first
- into a state diagram (perhaps going from a non-deterministic one to a
- deterministic one via a subset construction), and then a table.
-
- The real ``maintainable'' info is in the diagram; the switch statement is just
- ``compiled'' code. I'd tend to use a scanner generator, which can compute the
- states and number them just as well as I can!
-
- Defining names for the states here is useless, because 0, 1, 2... already
- serve as names, effectively. In particular, after you convert a NFA to a DFA,
- much of the meaning is lost. The original states of the non-deterministic FA
- may have some meaning. They correspond to the regular expression in a
- straightforward way. But after the subset construction to produce a
- deterministic FA, the states of the DFA don't intuitively correspond to
- anything. For instance (fooa|foob) will compress such that as "foo" is read,
- both branches are matched simultaneously by the same sequence of three state
- changes, and only the reading of a subsequent 'a' or 'b' finally selects a
- different state.
-
- To blindly follow a rule such as "all constants must be #defined or enumerated
- with symbolic names" is pure buffoonery, as is any other "rule of thumb".
-
- Rules of thumb are not powerful enough to express anything meaningful in
- computer science.
- --
-
-